Approximate Furthest Neighbor in High Dimensions

Authors

  • Rasmus Pagh
  • Francesco Silvestri
  • Johan Sivertsen
  • Matthew Skala
Abstract

Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries. We present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. We build on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's approximation factor and reducing the running time by a logarithmic factor. We also present a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis, and experimental results.
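
The abstract only outlines the approach, so the following is a rough sketch of the general idea of answering AFN queries from stored random projections, not the authors' actual data structure or query algorithm. The function names (build_index, query_afn), the parameters (num_projections, candidates_per_projection, max_candidates), and the candidate-scoring rule are all assumptions made for illustration. Points are projected onto Gaussian random directions, the points with the largest projection along each direction are kept, and at query time only the stored candidates whose projections lie furthest beyond the query's projection are checked with exact distances.

```python
import numpy as np


def build_index(points, num_projections=10, candidates_per_projection=20, seed=0):
    """Illustrative index: per random direction, remember the points that
    project furthest along it. (Hypothetical helper, not from the paper.)"""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    # One Gaussian random direction per row.
    directions = rng.standard_normal((num_projections, d))
    # projections[i, j] = projection of point i onto direction j.
    projections = points @ directions.T
    m = min(candidates_per_projection, n)
    # top[j] holds the indices of the m points with the largest projection on direction j.
    top = np.argsort(-projections, axis=0)[:m].T
    return directions, projections, top


def query_afn(points, index, q, max_candidates=20):
    """Return (index, distance) of the furthest point among a small candidate set."""
    directions, projections, top = index
    q_proj = directions @ q
    rows = np.arange(top.shape[0])[:, None]
    # Score each stored candidate by how far its projection exceeds the
    # query's projection along the same direction (an assumed heuristic).
    scores = projections[top, rows] - q_proj[:, None]
    # Keep the highest-scoring entries across all directions, then deduplicate.
    order = np.argsort(-scores, axis=None)[:max_candidates]
    candidates = np.unique(top.ravel()[order])
    # Exact distances are computed only for the chosen candidates.
    dists = np.linalg.norm(points[candidates] - q, axis=1)
    best = int(candidates[np.argmax(dists)])
    return best, float(dists.max())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.standard_normal((1000, 32))
    query = rng.standard_normal(32)
    idx = build_index(data)
    print(query_afn(data, idx, query))
```

In this sketch the answer is only approximate: it is the furthest point among at most max_candidates examined points, so its distance never exceeds the true furthest distance. Raising candidates_per_projection and max_candidates trades query time for accuracy, and the result becomes exact once every point is examined.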

Similar resources

Fast approximate furthest neighbors with data-dependent hashing

We present a novel hashing strategy for approximate furthest neighbor search that selects projection bases using the data distribution. This strategy leads to an algorithm, which we call DrusillaHash, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which lends i...

Approximate Furthest Neighbor with Application to Annulus Query

Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to pr...

When Crossings Count — Approximating the Minimum

We present a (1+ε)-approximation algorithm for computing the minimum-spanning tree of points in a planar arrangement of lines, where the metric is the number of crossings between the spanning tree and the lines. The expected running time of the algorithm is near linear. We also show how to embed such a crossing metric of hyperplanes in d-dimensions, in subquadratic time, into high-dimensions s...

Nearest Neighbor Search using Kd-trees

We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than three. Since the exact nearest neighbor search problem suffers from the curse of dimensi...

Hardness of String Similarity Search and Other Indexing Problems

Similarity search is a fundamental problem in computer science. Given a set of points from a universe and a distance measure, it is possible to pose similarity search queries on a point in the form of nearest neighbors (find the string that has the smallest edit distance to a query string) or in the form of furthest neighbors (find the string that has the longest common subsequence with a quer...

Publication date: 2015